Organizational Behavior: Geographic Distribution of Agricultural Units in Peru
I conducted a comprehensive descriptive analysis using maps in RStudio, leveraging data from the 2019 National Agricultural Survey (ENA). The dataset was sourced from the National Institute of Statistics and Informatics (INEI) repository, providing a rich foundation for insightful spatial exploration and visualization.
Objective: To visualize the geographic distribution of the agricultural units that participated in the 2019 National Agricultural Survey (ENA 2019) according to production size.
Methodology
Data: INEI
Survey: ENA 2019
Data preprocessing
#install.packages("mapsPERU")library(plotly)library(mapsPERU)library(ggplot2)library(tidyverse)library(ggrepel)library(dplyr) #para utilizar mutatelibrary(readr)library(haven)library(leaflet)library(leaflet.extras)library(rworldxtra)library(raster)library(sf)library(tidyverse)#Extraemos las coordenadas del paquete mapsPERU distritodf <- map_DIST#Cargamos nuestra base de datoscap1200 <- haven::read_sav("data/20_Cap1200.sav")#Visualizamos la base de datosdf
We notice that the columns DEPARTMENTO PROVINCIA DISTRITO are in lower case and with tilde while the database cap1200 is in upper case and without tilde. cap1200 database are in uppercase and without tilde. We have to homogenize them.
We convert all variables that have lowercase characters to uppercase.
# Rename the variable NOMBREDI of the base cap1200cap1200 <-rename(cap1200, DISTRITO = NOMBREDI)#Rename the categories of the size of agricultural unitsCod_tipo <-c(`1`="Pequeña y mediana UA",`2`="Grande UA")cap1200$CODIGO <-as.factor(cap1200$CODIGO)names(cap1200)
cap1200 <- cap1200 %>%mutate(CODIGO =recode_factor(CODIGO,!!!Cod_tipo))#We integrate both databases through DISTRITOENA2019 <-left_join(df, cap1200, by ="DISTRITO")names(ENA2019)
2. We add the coordinates of all Agricultural units.
leaflet() %>%addTiles() %>%addCircles(data = ENA2019, lat =~coords_y, lng =~coords_x)
3. We have added colors to identify the size of the agricultural units.
## to generate colors#number of types of producersNumber_tpp <- ENA2019$CODIGO %>%unique() %>%length()#Species nametpp_Names <- ENA2019$CODIGO %>%unique()## The colors of the sizes of the agricultural units will be:Colores <-c('#e41a1c', '#377eb8', '#4daf4a')table(ENA2019$CODIGO)
Pequeña y mediana UA Grande UA
33857 1847
#Linking the color palette to name typespal <-colorFactor(Colores, domain = tpp_Names)##Map with colors. fillopacity is transparencyleaflet() %>%addTiles() %>%addCircles(data = ENA2019, lat =~coords_y, lng =~coords_x, color =~pal(CODIGO), fillOpacity =0.5)
4. We add labels to agricultural units.
#labelsp <-leaflet() %>%addTiles() %>%addCircles(data = ENA2019, lat =~coords_y, lng =~coords_x, color =~pal(CODIGO),fillOpacity =0.5, label =~CODIGO, group ="Codigo")p
5. We generate a legend.
## Generar una leyendap <- p %>%addLegend(data = ENA2019, "bottomright", pal = pal, values =~CODIGO, title ="Tipos de productores", opacity =0.8, group ="Leyenda")p
6. We add layers according to the aspects of interest.
It can be observed that the majority of agricultural units in Peru have small and medium production sizes. In addition, they are mostly located in the central region of Peru.
Source Code
---title: "Organizational Behavior: Geographic Distribution of Agricultural Units in Peru"date: 2024-01-30description: "I conducted a comprehensive descriptive analysis using maps in RStudio, leveraging data from the 2019 National Agricultural Survey (ENA). The dataset was sourced from the National Institute of Statistics and Informatics (INEI) repository, providing a rich foundation for insightful spatial exploration and visualization."image: "img/units.jpg"categories: - maps - r - INEI - agricultural# code-fold: true# code-tools: true---# Introduction<!-- Adjust size using width and height attributes --><img src="img/units.jpg" alt="Agricultural Unit" width="350" height="500" style="float:right; margin: 0 0 10px 10px;">**Objective:** To visualize the geographic distribution of the agricultural units that participated in the 2019 National Agricultural Survey (ENA 2019) according to production size.# Methodology**Data:** INEI**Survey:** ENA 2019**Data preprocessing**```{r, message=FALSE, warning=FALSE}#install.packages("mapsPERU")library(plotly)library(mapsPERU)library(ggplot2)library(tidyverse)library(ggrepel)library(dplyr) #para utilizar mutatelibrary(readr)library(haven)library(leaflet)library(leaflet.extras)library(rworldxtra)library(raster)library(sf)library(tidyverse)#Extraemos las coordenadas del paquete mapsPERU distritodf <- map_DIST#Cargamos nuestra base de datoscap1200 <- haven::read_sav("data/20_Cap1200.sav")#Visualizamos la base de datosdfcap1200```We notice that the columns DEPARTMENTO PROVINCIA DISTRITO are in lower case and with tilde while the database cap1200 is in upper case and without tilde. cap1200 database are in uppercase and without tilde. We have to homogenize them.We convert all variables that have lowercase characters to uppercase.```{r, message=FALSE, warning=FALSE}df <- mutate_if(df, is.character, toupper)df```We remove accents from capital letters.```{r, message=FALSE, warning=FALSE}df$DEPARTAMENTO <- chartr('Á,É,Í,Ó,Ú','A,E,I,O,U', df$DEPARTAMENTO)df$PROVINCIA <- chartr('Á,É,Í,Ó,Ú','A,E,I,O,U', df$PROVINCIA)df$DISTRITO <- chartr('Á,É,Í,Ó,Ú','A,E,I,O,U', df$DISTRITO)``````{r, message=FALSE, warning=FALSE}# We display the name of the variablesnames(cap1200)# Rename the variable NOMBREDI of the base cap1200cap1200 <- rename(cap1200, DISTRITO = NOMBREDI)#Rename the categories of the size of agricultural unitsCod_tipo <- c(`1`="Pequeña y mediana UA", `2`="Grande UA")cap1200$CODIGO <- as.factor(cap1200$CODIGO)names(cap1200)cap1200 <- cap1200 %>% mutate(CODIGO = recode_factor(CODIGO,!!!Cod_tipo))#We integrate both databases through DISTRITOENA2019 <- left_join(df, cap1200, by = "DISTRITO")names(ENA2019)```# Results**1. We create the map base**```{r, message=FALSE, warning=FALSE}leaflet() %>% addTiles()```**2. We add the coordinates of all Agricultural units.**```{r, message=FALSE, warning=FALSE}leaflet() %>% addTiles() %>% addCircles(data = ENA2019, lat = ~coords_y, lng = ~coords_x)```**3. We have added colors to identify the size of the agricultural units.**```{r, message=FALSE, warning=FALSE}## to generate colors#number of types of producersNumber_tpp <- ENA2019$CODIGO %>% unique() %>% length()#Species nametpp_Names <- ENA2019$CODIGO %>% unique()## The colors of the sizes of the agricultural units will be:Colores <- c('#e41a1c', '#377eb8', '#4daf4a')table(ENA2019$CODIGO)#Linking the color palette to name typespal <- colorFactor(Colores, domain = tpp_Names)##Map with colors. fillopacity is transparencyleaflet() %>% addTiles() %>% addCircles(data = ENA2019, lat = ~coords_y, lng = ~coords_x, color = ~pal(CODIGO), fillOpacity = 0.5)```**4. We add labels to agricultural units.**```{r, message=FALSE, warning=FALSE}#labelsp <- leaflet() %>% addTiles() %>% addCircles(data = ENA2019, lat = ~coords_y, lng = ~coords_x, color = ~pal(CODIGO), fillOpacity = 0.5, label = ~CODIGO, group = "Codigo")p```**5. We generate a legend.**```{r, message=FALSE, warning=FALSE}## Generar una leyendap <- p %>% addLegend(data = ENA2019, "bottomright", pal = pal, values = ~CODIGO, title = "Tipos de productores", opacity = 0.8, group = "Leyenda")p```**6. We add layers according to the aspects of interest.**```{r, message=FALSE, warning=FALSE}## Seleccionar capasp <- p %>% addLayersControl(overlayGroups = c("Codigo", "Leyenda"), options = layersControlOptions(collapsed = F))p```# ConclusionsIt can be observed that the majority of agricultural units in Peru have small and medium production sizes. In addition, they are mostly located in the central region of Peru.